Normal Inverse Gaussian Distribution (norminvgauss)#
The Normal Inverse Gaussian (NIG) distribution is a flexible continuous distribution on (\mathbb{R}) that can capture heavy tails and skewness.
A key intuition is that NIG is a normal mean–variance mixture:
first draw a positive random variance (V) from an inverse Gaussian distribution
then draw (X\mid V) from a normal distribution with that random variance
This mixture view explains why NIG is popular in applications (e.g. financial returns, turbulence, stochastic volatility) and also gives a clean NumPy-only sampling algorithm.
Important naming pitfall: this is not the Normal-Inverse-Gamma distribution used in Bayesian conjugacy.
Notebook roadmap#
Title & classification
Intuition & motivation
Formal definition (PDF/CDF)
Moments & properties
Parameter interpretation
Derivations ((\mathbb{E}[X]), (\mathrm{Var}(X)), likelihood)
Sampling & simulation (NumPy-only)
Visualization (PDF, CDF, Monte Carlo)
SciPy integration (scipy.stats.norminvgauss)
Statistical use cases
Pitfalls
Summary
import math
import numpy as np
import scipy
from scipy import special, stats
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
SEED = 7
rng = np.random.default_rng(SEED)
np.set_printoptions(precision=6, suppress=True)
print("python", __import__("sys").version.split()[0])
print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)
python 3.12.9
numpy 1.26.2
scipy 1.15.0
plotly 6.5.2
Prerequisites & notation#
Prerequisites
comfort with PDFs/CDFs and expectation/variance
basic calculus (differentiation under the integral sign is helpful, but not required)
familiarity with moment generating functions (MGFs) / characteristic functions
Notation (common in the literature)
We will mainly use the ((\alpha,\beta,\delta,\mu)) parameterization:
(\alpha > 0): tail / steepness parameter
(\beta\in\mathbb{R}): asymmetry (skew) parameter, with constraint (|\beta| < \alpha)
(\delta > 0): scale parameter
(\mu\in\mathbb{R}): location parameter
Define [ \gamma = \sqrt{\alpha^2 - \beta^2} > 0. ]
SciPy parameterization
SciPy exposes NIG as scipy.stats.norminvgauss(a, b, loc=0, scale=1) with constraints
(a>0) and (|b|\le a). The mapping to ((\alpha,\beta,\delta,\mu)) is:
[ a = \alpha\delta,\qquad b = \beta\delta,\qquad \text{loc}=\mu,\qquad \text{scale}=\delta. ]
So if you think in ((\alpha,\beta,\delta,\mu)), you pass a=alpha*delta, b=beta*delta to SciPy.
1) Title & Classification#
Name: norminvgauss (Normal Inverse Gaussian / NIG)
Type: continuous
Support: (x \in \mathbb{R})
Parameter space (((\alpha,\beta,\delta,\mu)) form): [ \alpha>0,\quad \delta>0,\quad \mu\in\mathbb{R},\quad \beta\in\mathbb{R}\ \text{with}\ |\beta|<\alpha. ]
(SciPy form: (a>0), (|b|\le a), scale>0, loc real; the boundary (|b|=a) corresponds to (\gamma=0) and is numerically delicate.)
2) Intuition & Motivation#
What this distribution models#
NIG is often used when you want a model that is:
roughly bell-shaped, but
has heavier tails than a normal distribution (more extreme events), and/or
has skewness (asymmetric left vs right tails).
A powerful intuition is the mixture representation:
[ X \mid V \;\sim\; \mathcal{N}(\mu + \beta V,\; V), \qquad V \sim \text{Inverse-Gaussian}\left(\nu=\tfrac{\delta}{\gamma},\;\lambda=\delta^2\right), ]
where (V>0) is random.
Randomizing the variance creates heavy tails.
The term (\beta V) randomizes the mean in a correlated way, creating skewness.
Typical real-world use cases#
Finance: log-returns (skew + heavy tails), NIG Lévy process increments
Stochastic volatility / time change models: Brownian motion evaluated at inverse-Gaussian random time
Signal processing: impulsive / heavy-tailed noise
Environmental / turbulence data: heavy-tailed fluctuations
Relations to other distributions#
Generalized hyperbolic family: NIG is a special case (with (\lambda=-\tfrac12)).
Normal + inverse Gaussian: NIG is a normal mean–variance mixture with IG mixing.
Normal limit (heuristic): for large (\alpha) (with other parameters scaled appropriately), tails become lighter and the distribution becomes closer to normal.
A practical mental model:
NIG behaves like a normal distribution whose variance (and drift) jitters randomly from sample to sample.
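To make this mental model concrete, here is a minimal sketch (the parameter values are illustrative, not from the text): a normal whose variance is randomized by an inverse Gaussian develops excess kurtosis relative to a fixed-variance normal. It assumes scipy.stats.invgauss(mu=nu/lam, scale=lam) matches the (mean = nu, shape = lam) convention used later in this notebook.
# Mixture intuition demo (illustrative values): randomizing the variance creates heavy tails.
alpha_demo, beta_demo, delta_demo = 1.5, 0.0, 1.0
gamma_demo = math.sqrt(alpha_demo**2 - beta_demo**2)
nu_demo, lam_demo = delta_demo / gamma_demo, delta_demo**2
v_demo = stats.invgauss(mu=nu_demo / lam_demo, scale=lam_demo).rvs(size=100_000, random_state=rng)
x_mix = beta_demo * v_demo + np.sqrt(v_demo) * rng.normal(size=v_demo.size)   # normal with random variance
x_fixed = rng.normal(scale=math.sqrt(v_demo.mean()), size=v_demo.size)        # normal with fixed variance
{"excess_kurt_mixture": float(stats.kurtosis(x_mix)), "excess_kurt_fixed": float(stats.kurtosis(x_fixed))}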
3) Formal Definition#
PDF (((\alpha,\beta,\delta,\mu)) parameterization)#
Let (X \sim \text{NIG}(\alpha,\beta,\delta,\mu)) with (\alpha>0), (\delta>0), and (|\beta|<\alpha). Define (\gamma = \sqrt{\alpha^2-\beta^2}).
The PDF is
[ f(x\mid\alpha,\beta,\delta,\mu) = \frac{\alpha\delta}{\pi\sqrt{\delta^2 + (x-\mu)^2}} \; K_1\!\left(\alpha\sqrt{\delta^2 + (x-\mu)^2}\right) \exp\!\left(\delta\gamma + \beta(x-\mu)\right), ]
where (K_1) is the modified Bessel function of the second kind of order 1.
PDF (SciPy “standardized” form)#
SciPy defines a standardized density
[ f(y; a,b) = \frac{a\,K_1\!\left(a\sqrt{1+y^2}\right)}{\pi\sqrt{1+y^2}} \exp\!\left(\sqrt{a^2-b^2} + b y\right), \qquad y\in\mathbb{R}, ]
and then applies a location-scale transform (y=(x-\text{loc})/\text{scale}).
CDF#
There is no simple elementary closed form for the CDF. It is defined by the integral
[ F(x) = \mathbb{P}(X\le x) = \int_{-\infty}^{x} f(t)\,dt, ]
and in practice it is computed numerically (e.g. scipy.stats.norminvgauss.cdf).
def nig_validate(alpha: float, beta: float, delta: float) -> None:
if not (alpha > 0):
raise ValueError("alpha must be > 0")
if not (delta > 0):
raise ValueError("delta must be > 0")
if not (abs(beta) < alpha):
raise ValueError("need |beta| < alpha so gamma = sqrt(alpha^2 - beta^2) is real")
def nig_gamma(alpha: float, beta: float) -> float:
nig_validate(alpha, beta, delta=1.0)
return float(math.sqrt(alpha * alpha - beta * beta))
def nig_logpdf(x: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> np.ndarray:
'''Log-PDF using a numerically stable Bessel-K computation.
Uses scipy.special.kve(1, z) = exp(z) * K_1(z) to avoid underflow for large z.
'''
nig_validate(alpha, beta, delta)
x = np.asarray(x, dtype=float)
xm = x - mu
s2 = delta * delta + xm * xm
s = np.sqrt(s2)
gamma = math.sqrt(alpha * alpha - beta * beta)
z = alpha * s
# log K1(z) via scaled Bessel: K1(z) = exp(-z) * kve(1,z)
log_k1 = np.log(special.kve(1.0, z)) - z
return (
math.log(alpha * delta)
- math.log(math.pi)
- np.log(s)
+ delta * gamma
+ beta * xm
+ log_k1
)
def nig_pdf(x: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> np.ndarray:
return np.exp(nig_logpdf(x, alpha, beta, delta, mu))
def to_scipy_params(alpha: float, beta: float, delta: float, mu: float) -> tuple[float, float, float, float]:
'''Map (alpha,beta,delta,mu) to SciPy's (a,b,loc,scale).'''
nig_validate(alpha, beta, delta)
a = alpha * delta
b = beta * delta
return float(a), float(b), float(mu), float(delta)
def from_scipy_params(a: float, b: float, loc: float, scale: float) -> tuple[float, float, float, float]:
'''Map SciPy's (a,b,loc,scale) to (alpha,beta,delta,mu).'''
if not (scale > 0):
raise ValueError("scale must be > 0")
alpha = a / scale
beta = b / scale
delta = scale
mu = loc
nig_validate(alpha, beta, delta)
return float(alpha), float(beta), float(delta), float(mu)
# Quick sanity check: our PDF matches SciPy's parameter mapping
from scipy.stats import norminvgauss
alpha, beta, delta, mu = 2.5, 0.8, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = norminvgauss(a, b, loc=loc, scale=scale)
xg = np.linspace(rv.ppf(0.001), rv.ppf(0.999), 400)
max_abs_diff = float(np.max(np.abs(rv.pdf(xg) - nig_pdf(xg, alpha, beta, delta, mu))))
max_abs_diff
7.771561172376096e-16
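Since the CDF has no elementary closed form, here is a hedged cross-check (the evaluation points and tolerance are arbitrary): integrate our nig_pdf with scipy.integrate.quad and compare against SciPy's numerically computed cdf.
# Hedged quadrature cross-check of the CDF against scipy.stats.norminvgauss.cdf
from scipy import integrate
x_pts = np.array([-2.0, 0.0, 1.5])
cdf_quad = np.array([
    integrate.quad(lambda t: float(nig_pdf(t, alpha, beta, delta, mu)), -np.inf, xp)[0]
    for xp in x_pts
])
{"quad": cdf_quad, "scipy": rv.cdf(x_pts), "max_abs_diff": float(np.max(np.abs(cdf_quad - rv.cdf(x_pts))))}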
4) Moments & Properties#
A convenient starting point is the cumulant generating function (CGF) (K(t)=\log M_X(t)), where (M_X(t)=\mathbb{E}[e^{tX}]) is the MGF.
MGF / CGF#
For (t) in the interval where (|\beta+t|<\alpha):
[ M_X(t) = \exp\left(\mu t + \delta\big(\gamma - \sqrt{\alpha^2-(\beta+t)^2}\big)\right), \qquad \gamma = \sqrt{\alpha^2-\beta^2}. ]
Equivalently,
[ K(t) = \mu t + \delta\big(\gamma - \sqrt{\alpha^2-(\beta+t)^2}\big). ]
Domain: (t \in (-\alpha-\beta,\ \alpha-\beta)).
Characteristic function#
For real (u),
[ \varphi_X(u) = \mathbb{E}[e^{iuX}] = \exp\left(i\mu u + \delta\big(\gamma - \sqrt{\alpha^2-(\beta+iu)^2}\big)\right). ]
Mean, variance, skewness, kurtosis#
Using derivatives of (K(t)) at (t=0):
Mean: (\mathbb{E}[X] = \mu + \delta\beta/\gamma)
Variance: (\mathrm{Var}(X) = \delta\alpha^2/\gamma^3)
Skewness: (\text{skew}(X) = \dfrac{3\beta}{\alpha\sqrt{\delta\gamma}})
(Excess) kurtosis: (\text{excess kurt}(X) = \dfrac{3(1+4\beta^2/\alpha^2)}{\delta\gamma}) (so kurtosis (=3+) excess)
Entropy#
The differential entropy is
[ H(X) = -\mathbb{E}[\log f(X)] = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx. ]
There is no simple closed form in elementary functions; in practice you estimate it numerically (quadrature or Monte Carlo).
Other properties (high level)#
Infinitely divisible: NIG defines a Lévy process (useful for increments / time series).
Closure under convolution (with common (\alpha,\beta)): if (X_1\sim\text{NIG}(\alpha,\beta,\delta_1,\mu_1)) and (X_2\sim\text{NIG}(\alpha,\beta,\delta_2,\mu_2)) are independent, then (X_1+X_2\sim\text{NIG}(\alpha,\beta,\delta_1+\delta_2,\mu_1+\mu_2)).
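As a quick sanity check of the convolution property, here is a hedged Monte Carlo sketch (the parameter values are arbitrary): sum two independent NIG samples sharing (\alpha,\beta) and compare against the claimed NIG law with (\delta_1+\delta_2) and (\mu_1+\mu_2) via a KS test.
# Hedged Monte Carlo check of closure under convolution (shared alpha, beta); values are illustrative.
alpha_c, beta_c = 2.0, 0.6
d1, m1, d2, m2 = 0.8, -0.1, 1.5, 0.4
a1, b1, loc1, scale1 = to_scipy_params(alpha_c, beta_c, d1, m1)
a2, b2, loc2, scale2 = to_scipy_params(alpha_c, beta_c, d2, m2)
n_mc = 40_000
s = (
    stats.norminvgauss(a1, b1, loc=loc1, scale=scale1).rvs(size=n_mc, random_state=rng)
    + stats.norminvgauss(a2, b2, loc=loc2, scale=scale2).rvs(size=n_mc, random_state=rng)
)
a_s, b_s, loc_s, scale_s = to_scipy_params(alpha_c, beta_c, d1 + d2, m1 + m2)
stats.kstest(s, stats.norminvgauss(a_s, b_s, loc=loc_s, scale=scale_s).cdf)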
def nig_mean(alpha: float, beta: float, delta: float, mu: float) -> float:
nig_validate(alpha, beta, delta)
gamma = math.sqrt(alpha * alpha - beta * beta)
return float(mu + delta * beta / gamma)
def nig_var(alpha: float, beta: float, delta: float) -> float:
nig_validate(alpha, beta, delta)
gamma = math.sqrt(alpha * alpha - beta * beta)
return float(delta * alpha * alpha / (gamma**3))
def nig_skew(alpha: float, beta: float, delta: float) -> float:
nig_validate(alpha, beta, delta)
gamma = math.sqrt(alpha * alpha - beta * beta)
return float(3.0 * beta / (alpha * math.sqrt(delta * gamma)))
def nig_excess_kurt(alpha: float, beta: float, delta: float) -> float:
nig_validate(alpha, beta, delta)
gamma = math.sqrt(alpha * alpha - beta * beta)
return float(3.0 * (1.0 + 4.0 * (beta * beta) / (alpha * alpha)) / (delta * gamma))
def nig_mgf(t: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> np.ndarray:
'''MGF evaluated on an array t (real); returns nan outside the domain.'''
nig_validate(alpha, beta, delta)
t = np.asarray(t, dtype=float)
gamma = math.sqrt(alpha * alpha - beta * beta)
inside = alpha * alpha - (beta + t) ** 2
out = np.full_like(t, np.nan, dtype=float)
mask = inside > 0
out[mask] = np.exp(mu * t[mask] + delta * (gamma - np.sqrt(inside[mask])))
return out
def nig_cf(u: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> np.ndarray:
'''Characteristic function for real u.'''
nig_validate(alpha, beta, delta)
u = np.asarray(u, dtype=float)
gamma = math.sqrt(alpha * alpha - beta * beta)
inner = np.sqrt(alpha * alpha - (beta + 1j * u) ** 2)
return np.exp(1j * mu * u + delta * (gamma - inner))
alpha, beta, delta, mu = 2.5, 0.8, 1.2, -0.3
{
"mean": nig_mean(alpha, beta, delta, mu),
"var": nig_var(alpha, beta, delta),
"skew": nig_skew(alpha, beta, delta),
"excess_kurt": nig_excess_kurt(alpha, beta, delta),
}
{'mean': 0.10531231768391913,
'var': 0.5644389450812155,
'skew': 0.569429411031021,
'excess_kurt': 1.4878339661647202}
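As a hedged cross-check of these closed-form moments, the same quantities can be read off SciPy's generic stats(moments="mvsk") interface after mapping parameters (SciPy may compute some of them numerically):
# Cross-check the closed-form moments against SciPy's stats(moments="mvsk")
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
m_s, v_s, s_s, k_s = stats.norminvgauss(a, b, loc=loc, scale=scale).stats(moments="mvsk")
{"scipy_mean": float(m_s), "scipy_var": float(v_s), "scipy_skew": float(s_s), "scipy_excess_kurt": float(k_s)}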
# Entropy (Monte Carlo estimate): H(X) = -E[log f(X)]
# We'll reuse the NumPy-only sampler later, but SciPy's sampler works too.
from scipy.stats import norminvgauss
alpha, beta, delta, mu = 2.0, 0.5, 1.3, -0.2
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = norminvgauss(a, b, loc=loc, scale=scale)
x = rv.rvs(size=50_000, random_state=rng)
entropy_mc = float(-np.mean(nig_logpdf(x, alpha, beta, delta, mu)))
entropy_mc
1.2272073243850334
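A hedged quadrature cross-check of the Monte Carlo entropy estimate, integrating (-f\log f) numerically over the real line (the integration limits and accuracy are left to scipy.integrate.quad defaults):
# Hedged quadrature cross-check of the entropy: H(X) = -E[log f(X)]
from scipy import integrate
entropy_quad = integrate.quad(
    lambda t: float(-nig_pdf(t, alpha, beta, delta, mu) * nig_logpdf(t, alpha, beta, delta, mu)),
    -np.inf,
    np.inf,
)[0]
{"entropy_mc": entropy_mc, "entropy_quad": float(entropy_quad)}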
5) Parameter Interpretation#
Think in terms of ((\alpha,\beta,\delta,\mu)):
(\mu) (location) shifts the distribution left/right.
(\delta) (scale) stretches/compresses the distribution.
(\beta) (asymmetry) controls skewness:
(\beta>0): heavier/right tail (more mass on the right)
(\beta<0): heavier/left tail
(\alpha) (tail/steepness) controls tail heaviness:
larger (\alpha) (\Rightarrow) lighter tails (closer to normal)
smaller (\alpha) (\Rightarrow) heavier tails (more extremes)
The constraint (|\beta|<\alpha) ensures (\gamma=\sqrt{\alpha^2-\beta^2}) is real.
A useful derived quantity:
(\gamma) appears in the mean/variance and in the MGF domain.
As (|\beta|\to\alpha), (\gamma\to 0) and the distribution becomes extremely skewed/heavy-tailed and numerically unstable.
SciPy caution: SciPy’s a and b are scaled by (\delta) (since (a=\alpha\delta), (b=\beta\delta)).
So changing scale while holding a,b fixed changes ((\alpha,\beta)) as well.
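A small illustration of this caution (a sketch; the numbers are arbitrary): holding a, b fixed and doubling scale changes the implied ((\alpha,\beta)).
# Fixing a, b and changing scale changes (alpha, beta); tuples are (alpha, beta, delta, mu)
a_fix, b_fix = 2.4, 0.6
{
    "scale=1.0": from_scipy_params(a_fix, b_fix, loc=0.0, scale=1.0),
    "scale=2.0": from_scipy_params(a_fix, b_fix, loc=0.0, scale=2.0),
}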
6) Derivations#
We sketch three core derivations.
A) Expectation#
Start from the CGF:
[ K(t) = \mu t + \delta\Big(\gamma - \sqrt{\alpha^2-(\beta+t)^2}\Big). ]
Differentiate:
[ K'(t) = \mu + \delta\,\frac{\beta+t}{\sqrt{\alpha^2-(\beta+t)^2}}. ]
Evaluating at (t=0) gives the mean:
[ \mathbb{E}[X] = K'(0) = \mu + \delta\,\frac{\beta}{\gamma}. ]
B) Variance#
Differentiate again:
[ K''(t) = \delta\,\frac{\alpha^2}{\big(\alpha^2-(\beta+t)^2\big)^{3/2}}, ]
so
[ \mathrm{Var}(X) = K''(0) = \delta\,\frac{\alpha^2}{\gamma^3}. ]
C) Likelihood#
For i.i.d. data (x_1,\dots,x_n), the likelihood is
[ L(\alpha,\beta,\delta,\mu) = \prod_{i=1}^n f(x_i\mid\alpha,\beta,\delta,\mu), ]
and the log-likelihood is
[ \ell(\alpha,\beta,\delta,\mu) = \sum_{i=1}^n \log f(x_i\mid\alpha,\beta,\delta,\mu). ]
Because the density involves (K_1(\cdot)), the MLE does not have a simple closed form; numerical optimization is typical.
In SciPy you can use norminvgauss.fit (MLE under the loc/scale convention).
def nig_loglik(alpha: float, beta: float, delta: float, mu: float, x: np.ndarray) -> float:
return float(np.sum(nig_logpdf(x, alpha, beta, delta, mu)))
alpha, beta, delta, mu = 2.5, 0.8, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = stats.norminvgauss(a, b, loc=loc, scale=scale)
x = rv.rvs(size=2_000, random_state=rng)
ll_true = nig_loglik(alpha, beta, delta, mu, x)
# Compare to a slightly misspecified parameter (lower beta)
ll_alt = nig_loglik(alpha, beta * 0.7, delta, mu, x)
{"loglik_true": ll_true, "loglik_alt": ll_alt, "diff": ll_true - ll_alt}
{'loglik_true': -2185.389754329737,
'loglik_alt': -2212.5723866013323,
'diff': 27.18263227159514}
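Since the MLE has no closed form, here is a minimal numerical-optimization sketch (assumptions: Nelder–Mead from scipy.optimize, a tanh reparameterization to enforce (|\beta|<\alpha), log transforms for positivity, and crude starting values; norminvgauss.fit is the more robust route in practice).
# Hedged sketch of direct numerical MLE for (alpha, beta, delta, mu)
from scipy.optimize import minimize

def nig_neg_loglik(theta: np.ndarray, data: np.ndarray) -> float:
    # theta = (log_alpha, beta_raw, log_delta, mu); beta = alpha*tanh(beta_raw) keeps |beta| < alpha
    log_alpha, beta_raw, log_delta, mu_ = theta
    alpha_ = math.exp(log_alpha)
    beta_ = alpha_ * math.tanh(beta_raw)
    delta_ = math.exp(log_delta)
    try:
        return -nig_loglik(alpha_, beta_, delta_, mu_, data)
    except ValueError:
        return np.inf  # reject parameter values outside the valid region

theta0 = np.array([math.log(2.0), 0.0, 0.0, float(np.median(x))])
res = minimize(nig_neg_loglik, theta0, args=(x,), method="Nelder-Mead")
alpha_hat = math.exp(res.x[0])
{
    "success": bool(res.success),
    "alpha_hat": alpha_hat,
    "beta_hat": alpha_hat * math.tanh(res.x[1]),
    "delta_hat": math.exp(res.x[2]),
    "mu_hat": float(res.x[3]),
}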
7) Sampling & Simulation (NumPy-only)#
The mixture representation gives a simple sampler.
Step 1: sample the inverse Gaussian mixing variable#
We use the ((\nu,\lambda)) parameterization with density
[ f(v;\nu,\lambda) = \sqrt{\frac{\lambda}{2\pi v^3}} \exp\left(-\frac{\lambda(v-\nu)^2}{2\nu^2 v}\right), \qquad v>0. ]
For NIG, we need
[ V \sim \text{IG}\left(\nu=\frac{\delta}{\gamma},\;\lambda=\delta^2\right). ]
A classic exact sampler is the Michael–Schucany–Haas method.
Step 2: sample the conditional normal#
Given (V=v):
[ X\mid V=v \sim \mathcal{N}(\mu+\beta v,\; v). ]
So we can generate
[ X = \mu + \beta V + \sqrt{V}\,Z,\qquad Z\sim\mathcal{N}(0,1). ]
Everything below uses NumPy only.
def sample_invgauss_msh(size: int, nu: float, lam: float, rng: np.random.Generator) -> np.ndarray:
'''Sample IG(nu, lam) using the Michael–Schucany–Haas method.
Parameterization: mean = nu, shape = lam.
'''
if not (nu > 0):
raise ValueError("nu must be > 0")
if not (lam > 0):
raise ValueError("lam must be > 0")
v = rng.normal(size=size)
y = v * v
nu2 = nu * nu
x = (
nu
+ (nu2 * y) / (2.0 * lam)
- (nu / (2.0 * lam)) * np.sqrt(4.0 * nu * lam * y + nu2 * y * y)
)
u = rng.uniform(size=size)
return np.where(u <= nu / (nu + x), x, nu2 / x)
def sample_nig(size: int, alpha: float, beta: float, delta: float, mu: float, rng: np.random.Generator) -> np.ndarray:
nig_validate(alpha, beta, delta)
gamma = math.sqrt(alpha * alpha - beta * beta)
nu = delta / gamma
lam = delta * delta
v = sample_invgauss_msh(size=size, nu=nu, lam=lam, rng=rng)
z = rng.normal(size=size)
return mu + beta * v + np.sqrt(v) * z
# Quick simulation check: sample moments vs formulas
alpha, beta, delta, mu = 2.0, 0.5, 1.3, -0.2
x = sample_nig(size=200_000, alpha=alpha, beta=beta, delta=delta, mu=mu, rng=rng)
{
"sample_mean": float(x.mean()),
"theory_mean": nig_mean(alpha, beta, delta, mu),
"sample_var": float(x.var(ddof=0)),
"theory_var": nig_var(alpha, beta, delta),
}
{'sample_mean': 0.13341510615123034,
'theory_mean': 0.13565855667130944,
'sample_var': 0.7147789033673897,
'theory_var': 0.7160715875654602}
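Beyond matching moments, a quick distributional sanity check of the NumPy-only sampler against SciPy's numerically computed CDF (a sketch; the KS p-value is itself random, so treat it as a diagnostic, not a proof):
# Hedged KS check of the NumPy-only sampler against SciPy's CDF
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv_check = stats.norminvgauss(a, b, loc=loc, scale=scale)
stats.kstest(x[:20_000], rv_check.cdf)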
8) Visualization (PDF, CDF, Monte Carlo)#
We’ll visualize:
how changing (\alpha) and (\beta) affects tail heaviness and skewness
the PDF and CDF for a chosen parameter set
a Monte Carlo histogram + empirical CDF compared to the theoretical curves
def plot_nig_pdfs(param_sets: list[dict], q_low: float = 0.001, q_high: float = 0.999) -> go.Figure:
from scipy.stats import norminvgauss
fig = go.Figure()
for ps in param_sets:
alpha, beta, delta, mu = ps["alpha"], ps["beta"], ps["delta"], ps["mu"]
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = norminvgauss(a, b, loc=loc, scale=scale)
xs = np.linspace(rv.ppf(q_low), rv.ppf(q_high), 600)
ys = nig_pdf(xs, alpha, beta, delta, mu)
label = f"α={alpha:g}, β={beta:g}, δ={delta:g}, μ={mu:g}"
fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name=label))
fig.update_layout(title="Normal Inverse Gaussian PDFs", xaxis_title="x", yaxis_title="f(x)")
return fig
param_sets = [
{"alpha": 3.0, "beta": 0.0, "delta": 1.0, "mu": 0.0}, # symmetric, lighter tails
{"alpha": 1.6, "beta": 0.0, "delta": 1.0, "mu": 0.0}, # symmetric, heavier tails
{"alpha": 2.0, "beta": 0.7, "delta": 1.0, "mu": 0.0}, # right-skew
{"alpha": 2.0, "beta": -0.7, "delta": 1.0, "mu": 0.0}, # left-skew
]
fig = plot_nig_pdfs(param_sets)
fig.show()
# PDF + CDF for one parameter set (CDF via SciPy)
from scipy.stats import norminvgauss
alpha, beta, delta, mu = 2.0, 0.7, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = norminvgauss(a, b, loc=loc, scale=scale)
xs = np.linspace(rv.ppf(0.001), rv.ppf(0.999), 700)
fig_pdf = go.Figure()
fig_pdf.add_trace(go.Scatter(x=xs, y=rv.pdf(xs), mode="lines", name="pdf"))
fig_pdf.update_layout(title="NIG PDF", xaxis_title="x", yaxis_title="f(x)")
fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=xs, y=rv.cdf(xs), mode="lines", name="cdf"))
fig_cdf.update_layout(title="NIG CDF", xaxis_title="x", yaxis_title="F(x)")
fig_pdf.show()
fig_cdf.show()
# Monte Carlo samples vs PDF
alpha, beta, delta, mu = 2.0, 0.7, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = stats.norminvgauss(a, b, loc=loc, scale=scale)
n = 80_000
x = sample_nig(size=n, alpha=alpha, beta=beta, delta=delta, mu=mu, rng=rng)
xs = np.linspace(rv.ppf(0.001), rv.ppf(0.999), 600)
fig = px.histogram(
x,
nbins=90,
histnorm="probability density",
title=f"Monte Carlo samples vs PDF (n={n:,})",
labels={"value": "x"},
)
fig.add_trace(go.Scatter(x=xs, y=rv.pdf(xs), mode="lines", name="theoretical pdf"))
fig.update_layout(yaxis_title="density")
fig.show()
# Empirical CDF vs theoretical CDF
alpha, beta, delta, mu = 2.0, 0.7, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = stats.norminvgauss(a, b, loc=loc, scale=scale)
n = 30_000
x = sample_nig(size=n, alpha=alpha, beta=beta, delta=delta, mu=mu, rng=rng)
xs = np.sort(x)
ys = np.arange(1, n + 1) / n
xg = np.linspace(rv.ppf(0.001), rv.ppf(0.999), 600)
fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name="empirical CDF"))
fig.add_trace(go.Scatter(x=xg, y=rv.cdf(xg), mode="lines", name="theoretical CDF"))
fig.update_layout(title="Empirical CDF vs theoretical CDF", xaxis_title="x", yaxis_title="F(x)")
fig.show()
9) SciPy Integration (scipy.stats.norminvgauss)#
SciPy provides:
pdf, logpdf
cdf, ppf (numerical)
rvs (sampling)
fit (MLE)
Remember the mapping:
[ a = \alpha\delta,\; b=\beta\delta,\; \text{loc}=\mu,\; \text{scale}=\delta. ]
from scipy.stats import norminvgauss
alpha, beta, delta, mu = 2.2, 0.6, 1.4, -0.1
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = norminvgauss(a, b, loc=loc, scale=scale)
# Basic API
x0 = np.array([-2.0, 0.0, 1.0])
out = {
"pdf(x0)": rv.pdf(x0),
"cdf(x0)": rv.cdf(x0),
"mean": rv.mean(),
"var": rv.var(),
}
out
{'pdf(x0)': array([0.008404, 0.519158, 0.28318 ]),
'cdf(x0)': array([0.002845, 0.371315, 0.820179]),
'mean': 0.29686269665968856,
'var': 0.7145890817830703}
# Fit example: recover parameters from synthetic data
alpha, beta, delta, mu = 2.0, 0.5, 1.3, -0.2
a_true, b_true, loc_true, scale_true = to_scipy_params(alpha, beta, delta, mu)
rv_true = norminvgauss(a_true, b_true, loc=loc_true, scale=scale_true)
data = rv_true.rvs(size=3_000, random_state=rng)
a_hat, b_hat, loc_hat, scale_hat = norminvgauss.fit(data)
{
"true": (a_true, b_true, loc_true, scale_true),
"hat": (a_hat, b_hat, loc_hat, scale_hat),
}
{'true': (2.6, 0.65, -0.2, 1.3),
'hat': (3.080842815399469,
0.6938365831228384,
-0.21570215913180396,
1.4409291715704364)}
10) Statistical Use Cases#
A) Hypothesis testing / goodness-of-fit#
In practice you often ask: Do my residuals / increments look NIG?
A common approach is a goodness-of-fit test (e.g. Kolmogorov–Smirnov).
Important caution: if you estimate parameters from the data and then run a standard KS test, the p-value is no longer valid (because the null distribution changed). A simple fix is a parametric bootstrap:
fit parameters (\hat\theta)
compute the test statistic on the observed data
simulate many datasets from the fitted model
refit + recompute the statistic for each simulated dataset
estimate the p-value by the bootstrap tail probability
B) Bayesian modeling#
NIG is useful as a likelihood (or error model) when residuals are heavy-tailed and skewed.
The mixture representation introduces latent variables (V_i) such that
[ X_i \mid V_i \sim \mathcal{N}(\mu+\beta V_i,\;V_i), \quad V_i \sim \text{IG}(\delta/\gamma,\;\delta^2), ]
which can make Bayesian inference more tractable because the conditional model is Gaussian.
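A minimal sketch of the augmented (complete-data) log-likelihood implied by this latent-variable representation. It assumes the ((\nu,\lambda)) inverse Gaussian maps to scipy.stats.invgauss via mu = nu/lam, scale = lam; note that, given the (V_i), the terms involving ((\mu,\beta)) are Gaussian.
# Hedged sketch: complete-data log-likelihood sum_i [ log N(x_i; mu + beta v_i, v_i) + log IG(v_i; nu, lam) ]
def nig_complete_data_loglik(x: np.ndarray, v: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> float:
    nig_validate(alpha, beta, delta)
    gamma = math.sqrt(alpha * alpha - beta * beta)
    nu, lam = delta / gamma, delta * delta
    ll_normal = float(stats.norm.logpdf(x, loc=mu + beta * v, scale=np.sqrt(v)).sum())
    ll_ig = float(stats.invgauss.logpdf(v, mu=nu / lam, scale=lam).sum())  # assumed (nu, lam) -> scipy mapping
    return ll_normal + ll_ig

# Illustrative values: draw latent variances, then the conditionally normal observations
alpha_b, beta_b, delta_b, mu_b = 2.0, 0.5, 1.3, -0.2
gamma_b = math.sqrt(alpha_b**2 - beta_b**2)
v_lat = sample_invgauss_msh(size=1_000, nu=delta_b / gamma_b, lam=delta_b**2, rng=rng)
x_lat = mu_b + beta_b * v_lat + np.sqrt(v_lat) * rng.normal(size=v_lat.size)
nig_complete_data_loglik(x_lat, v_lat, alpha_b, beta_b, delta_b, mu_b)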
C) Generative modeling#
Because NIG is infinitely divisible, it is natural for increments. A simple generative model is a random walk with NIG innovations:
[ S_t = S_{t-1} + \varepsilon_t,\qquad \varepsilon_t \sim \text{NIG}(\alpha,\beta,\delta,\mu). ]
# A) Parametric bootstrap KS test demo (small B for speed)
# We'll generate data from a known NIG, fit it, and test goodness-of-fit.
from scipy.stats import kstest, norminvgauss
rng_local = np.random.default_rng(123)
alpha, beta, delta, mu = 2.0, 0.5, 1.3, -0.2
a_true, b_true, loc_true, scale_true = to_scipy_params(alpha, beta, delta, mu)
rv_true = norminvgauss(a_true, b_true, loc=loc_true, scale=scale_true)
n = 800
x = rv_true.rvs(size=n, random_state=rng_local)
# Fit the model
a_hat, b_hat, loc_hat, scale_hat = norminvgauss.fit(x)
rv_hat = norminvgauss(a_hat, b_hat, loc=loc_hat, scale=scale_hat)
# KS statistic on observed data against fitted CDF
ks_obs = kstest(x, rv_hat.cdf).statistic
# Parametric bootstrap: simulate from fitted, refit, recompute KS
B = 80
ks_boot = np.empty(B)
for i in range(B):
xb = rv_hat.rvs(size=n, random_state=rng_local)
a_b, b_b, loc_b, scale_b = norminvgauss.fit(xb)
rv_b = norminvgauss(a_b, b_b, loc=loc_b, scale=scale_b)
ks_boot[i] = kstest(xb, rv_b.cdf).statistic
p_boot = float(np.mean(ks_boot >= ks_obs))
{"ks_obs": float(ks_obs), "p_boot": p_boot, "B": B}
{'ks_obs': 0.01620866142668531, 'p_boot': 0.75, 'B': 80}
# C) Generative modeling: random walk with NIG vs normal innovations (matched mean/var)
T = 300
alpha, beta, delta, mu = 2.0, 0.5, 1.0, 0.0
eps_nig = sample_nig(size=T, alpha=alpha, beta=beta, delta=delta, mu=mu, rng=rng)
m = nig_mean(alpha, beta, delta, mu)
v = nig_var(alpha, beta, delta)
eps_norm = rng.normal(loc=m, scale=math.sqrt(v), size=T)
s0 = 0.0
s_nig = s0 + np.cumsum(eps_nig)
s_norm = s0 + np.cumsum(eps_norm)
fig = go.Figure()
fig.add_trace(go.Scatter(y=s_nig, mode="lines", name="NIG random walk"))
fig.add_trace(go.Scatter(y=s_norm, mode="lines", name="Normal random walk (matched mean/var)"))
fig.update_layout(title="Random walk paths", xaxis_title="t", yaxis_title="S_t")
fig.show()
# Compare tail behavior of innovations
qs = [0.001, 0.01, 0.5, 0.99, 0.999]
{
"quantiles": qs,
"nig": np.quantile(eps_nig, qs),
"normal": np.quantile(eps_norm, qs),
}
{'quantiles': [0.001, 0.01, 0.5, 0.99, 0.999],
'nig': array([-2.21934 , -1.787824, 0.211728, 2.341481, 2.964354]),
'normal': array([-2.208528, -1.544096, 0.255063, 1.975488, 2.075406])}
11) Pitfalls#
Parameter constraints: you must have (\alpha>0), (\delta>0), and (|\beta|<\alpha). Near the boundary (|\beta|\approx\alpha), numerical issues are common.
Confusing names:
norminvgauss (Normal Inverse Gaussian) is different from normal-inverse-gamma.
Numerical stability:
the PDF involves (K_1(z)), which can underflow for large (z); see the sketch after this list
prefer logpdf for inference and use scaled Bessel functions (e.g. scipy.special.kve) when implementing formulas
Fitting can be unstable:
small samples may not pin down tail parameters well
MLE may be sensitive to initialization / local optima
always validate fit quality with diagnostic plots (QQ plots, tail behavior)
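A tiny illustration of the Bessel underflow pitfall mentioned above (the argument value 800 is arbitrary): the unscaled (K_1) underflows to zero, while the scaled kve route still gives a finite log value.
# K_1 underflow vs the scaled-Bessel log computation
z_big = 800.0
{
    "kv(1, z) directly": float(special.kv(1.0, z_big)),
    "log K1(z) via kve": float(np.log(special.kve(1.0, z_big)) - z_big),
}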
12) Summary#
norminvgauss is a continuous distribution on (\mathbb{R}) that models skewed, heavy-tailed data.
Its key intuition is a normal mean–variance mixture with an inverse Gaussian mixing variable.
The PDF involves a modified Bessel function (K_1); the CDF is typically computed numerically.
The CGF/MGF gives compact formulas for mean/variance/skewness/kurtosis.
Sampling is straightforward via the mixture representation and can be done with NumPy only.
SciPy’s scipy.stats.norminvgauss provides pdf, cdf, rvs, and fit with a clear mapping to ((\alpha,\beta,\delta,\mu)).